Pronunciation Clustering and Modeling of Variability for Appearance-Based Sign Language Recognition
Abstract
In sign language recognition from video, most approaches first segment and track the signer's hands and head and then extract a feature vector from these regions [1, 2]. Occlusions between the hands and the head, noise, or brisk movements can make segmentation difficult, so many approaches rely on special data acquisition tools such as data gloves, colored gloves, or wearable cameras. Furthermore, different signers express the words and phrases of sign language differently; a single word sometimes has two or three pronunciations that differ in visual appearance. In this work, we introduce a database of video streams for American Sign Language word recognition. The utterances are extracted from a publicly available database and can therefore be used by other research groups. This database, which we call ‘BOSTON50’, consists of 483 utterances of 50 words. One important property of this database is the large variability of utterances for each word, which makes it more difficult to recognize automatically than databases in which all utterances are signed uniformly. So far, this problem has not been dealt with in the literature on sign language recognition. To overcome these shortcomings we suggest the following novel approaches:
Similar resources
Robust appearance based sign language recognition
In this work, we introduce a robust appearance-based sign language recognition system which is derived from a large vocabulary speech recognition system. The system employs a large variety of methods known from automatic speech recognition research for the modeling of temporal and language specific issues. The feature extraction part of the system is based on recent developments in image proces...
Perception and Synthesis of Biologically Plausible Motion: From Human Physiology to Virtual Reality
Temporal measures of hand and speech coordination during French cued speech production; Using signing space as a representation for sign language processing; Spatialised semantic relations in French sign language: toward a computational modelling; Automatic generation of German sign language glosses from German words; French sign language processing: verb agreement; R...
Mandarin Pronunciation Modeling Based on CASS Corpus
Pronunciation variability is an important issue that must be faced when developing practical automatic spontaneous speech recognition systems. In this paper, the factors that may affect the recognition performance are analyzed, including those specific to the Chinese language. By studying the INITIAL/FINAL (IF) characteristics of Chinese language and developing the Bayesian equation, w...
Modeling of Pronunciation, Language and Nonverbal Units at Conversational Russian Speech Recognition
The main problems in developing a conversational Russian speech recognition system are variability of pronunciation, free word order in sentences, and the presence of speech disfluencies. In the paper, pronunciation variability is modeled by creating multiple word transcriptions. A syntactic-statistical language model that takes into account long-distance word dependencies is proposed for Russian ...
Modeling pronunciation variation using context-dependent weighting and b/s refined acoustic modeling
Pronunciation variability is an important issue that must be faced when developing practical automatic spontaneous speech recognition systems. By studying the initial/final (IF) characteristics of Chinese language and developing the Bayesian equation, we propose the concepts of generalized initial/final (GIF) and generalized syllable (GS), the GIF modeling method and the IF-GIF modelin...
Publication date: 2005